Boosting First-Order Clauses for Large, Skewed Data Sets

Authors

  • Louis Oliphant
  • Elizabeth S. Burnside
  • Jude W. Shavlik
Abstract

Creating an effective ensemble of clauses for large, skewed data sets requires finding a diverse, high-scoring set of clauses and then combining them in such a way as to maximize predictive performance. We have adapted the RankBoost algorithm in order to maximize area under the recall-precision curve, a much better metric when working with highly skewed data sets than ROC curves. We have also explored a range of possibilities for the weak hypotheses used by our modified RankBoost algorithm beyond using individual clauses. We provide results on four large, skewed data sets showing that our modified RankBoost algorithm outperforms the original on area under the recall-precision curves.
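To illustrate why the abstract prefers recall-precision curves to ROC curves on skewed data, here is a minimal sketch in plain Python. It is not the authors' algorithm or evaluation code. The data set (10 positives, 990 negatives) and the ranking are hypothetical, and average precision is used here as a simple stand-in estimate of the area under the recall-precision curve, which may differ from the interpolation the paper uses.

```python
def roc_auc(scores, labels):
    """Fraction of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(scores, labels):
    """Mean of the precision measured at the rank of each positive example."""
    ranked = sorted(zip(scores, labels), key=lambda t: -t[0])
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank
    return total / hits


# Hypothetical ranking: 20 negatives score highest, then all 10 positives,
# then the remaining 970 negatives. Scores are distinct and descending.
labels = [0] * 20 + [1] * 10 + [0] * 970
scores = [float(len(labels) - i) for i in range(len(labels))]

roc = roc_auc(scores, labels)
ap = average_precision(scores, labels)
print(f"ROC AUC = {roc:.3f}, average precision = {ap:.3f}")
```

In this constructed case the ROC AUC is about 0.98 while the average precision is only about 0.21: the 20 negatives ranked above every positive barely affect the ROC score because there are so many negatives, but they dominate the precision at every recall level.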


Similar resources

Boosting interval based literals

A supervised classification method for time series, even multivariate, is presented. It is based on boosting very simple classifiers: clauses with one literal in the body. The background predicates are based on temporal intervals. Two types of predicates are used: i) relative predicates, such as “increases” and “stays”, and ii) region predicates, such as “always” and “sometime”, which operate o...


Rewrite-Based Equational Theorem Proving with Selection and Simplification

We present various refutationally complete calculi for first-order clauses with equality that allow for arbitrary selection of negative atoms in clauses. Refutation completeness is established via the use of well-founded orderings on clauses for defining a Herbrand model for a consistent set of clauses. We also formulate an abstract notion of redundancy and show that the deletion of redundant c...


Time Series Classification by Boosting Interval Based Literals

A supervised classification method for temporal series, even multivariate, is presented. It is based on boosting very simple classifiers: clauses with one literal in the body. The background predicates are based on temporal intervals. Two types of predicates are used: i) relative predicates, such as “increases” and “stays”, and ii) region predicates, such as “always” and “sometime”, which opera...


Asociación Española Para La Inteligencia Artificial España Time Series Classification by Boosting Interval Based Literals *

A supervised classification method for temporal series, even multivariate, is presented. It is based on boosting very simple classifiers: clauses with one literal in the body. The background predicates are based on temporal intervals. Two types of predicates are used: i) relative predicates, such as “increases” and “stays”, and ii) region predicates, such as “always” and “sometime”, which opera...


Boosting Descriptive ILP for Predictive Learning

Inductive Logic Programming has been very successful in application to multirelational predictive tasks. Sophisticated predictive ILP systems, such as Progol and FOIL, can achieve high predictive accuracy, while the learning results remain understandable. Although boosting [1] is an established method to promote predictive accuracy of weak algorithms, there have been relatively few efforts to a...




Publication date: 2009